On Lloyd’s k-means Method
Abstract
We present polynomial upper and lower bounds on the number of iterations performed by Lloyd’s method for k-means clustering. Our upper bounds are polynomial in the number of points, the number of clusters, and the spread of the point set (the ratio between its largest and smallest pairwise distances). We also present a lower bound showing that, in the worst case, the k-means heuristic needs to perform Ω(n) iterations for n points on the real line and two centers. Surprisingly, the spread of the point set in this construction is polynomial in n. This is the first construction showing that the k-means heuristic requires more than a polylogarithmic number of iterations. Furthermore, we present two alternative algorithms with guaranteed performance, both simple variants of Lloyd’s method. Results of our experimental studies on these algorithms are also presented.
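To make the quantities in the abstract concrete, here is a minimal sketch of Lloyd’s method in its standard textbook form. It alternates a nearest-center assignment step with a centroid update step until the partition stops changing, and it reports how many center updates were performed, alongside the spread of a point set. This is an illustrative reconstruction, not the paper’s code. The names `lloyd`, `spread`, and `sq_dist`, the lowest-index tie-breaking, the rule that a center with an empty cluster stays put, and the convention for counting iterations are all assumptions; the paper’s own conventions may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def spread(points: Sequence[Point]) -> float:
    """Ratio of the largest to the smallest pairwise distance.

    Assumes at least two points and that all points are distinct.
    """
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return max(dists) / min(dists)


def lloyd(points: Sequence[Point], centers: Sequence[Point],
          max_iter: int = 10_000) -> Tuple[List[Point], Optional[List[int]], int]:
    """Run Lloyd's method from the given initial centers (hypothetical helper).

    Returns (final centers, cluster index of each point, number of center
    updates performed before the partition stopped changing).
    """
    centers = [tuple(c) for c in centers]
    assignment: Optional[List[int]] = None
    for it in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center
        # (ties go to the lowest center index, an assumed convention).
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            # Partition unchanged: the centers are already the centroids.
            return centers, assignment, it - 1
        assignment = new_assignment
        # Update step: move each center to the centroid of its cluster;
        # a center whose cluster is empty stays where it is (assumption).
        updated: List[Point] = []
        for j, c in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(c)
        centers = updated
    return centers, assignment, max_iter


if __name__ == "__main__":
    # Four points on the real line with two centers, the setting of the lower bound.
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    centers, labels, iters = lloyd(pts, [(0.0,), (1.0,)])
    print(centers, labels, iters)  # [(0.5,), (10.5,)] [0, 0, 1, 1] 2
    print(spread(pts))             # 11.0
```

The usage example uses a tiny one-dimensional instance. The abstract’s Ω(n) lower bound concerns specially constructed inputs, so this toy instance converges after only two updates. The `spread` helper computes every pairwise distance, which is quadratic in the number of points and suited only to small illustrations.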
Similar resources
Hartigan's K-Means Versus Lloyd's K-Means - Is It Time for a Change?
Hartigan’s method for k-means clustering holds several potential advantages compared to the classical and prevalent optimization heuristic known as Lloyd’s algorithm. For example, it was recently shown that the set of local minima of Hartigan’s algorithm is a subset of those of Lloyd’s method. We develop a closed-form expression that allows one to establish Hartigan’s method for k-means clustering with an...
On Lloyd's Algorithm: New Theoretical Insights for Clustering in Practice
We provide new analyses of Lloyd’s algorithm (1982), commonly known as the k-means clustering algorithm. Kumar and Kannan (2010) showed that running k-SVD followed by a constant approximation k-means algorithm, and then Lloyd’s algorithm, will correctly cluster nearly all of the dataset with respect to the optimal clustering, provided the dataset satisfies a deterministic clusterability assumpt...
Accelerating Lloyd’s Algorithm for k-Means Clustering
The k-means clustering algorithm, a staple of data mining and unsupervised learning, is popular because it is simple to implement, fast, easily parallelized, and offers intuitive results. Lloyd’s algorithm is the standard batch, hill-climbing approach for minimizing the k-means optimization criterion. It spends a vast majority of its time computing distances between each of the k cluster center...
Hartigan's Method: k-means Clustering without Voronoi
Hartigan’s method for k-means clustering is the following greedy heuristic: select a point, and optimally reassign it. This paper develops two other formulations of the heuristic, one leading to a number of consistency properties, the other showing that the data partition is always quite separated from the induced Voronoi partition. A characterization of the volume of this separation is provide...
Further heuristics for $k$-means: The merge-and-split heuristic and the $(k, l)$-means
The k-means clustering problem asks to partition the data into k clusters so as to minimize the sum of the squared Euclidean distances of the data points to their closest cluster center. Finding the optimal k-means clustering of a d-dimensional data set is NP-hard in general, and many heuristics have been designed to monotonically minimize the k-means objective function. Those heuristics got ...